Jan-v2-VL is an 8-billion-parameter vision-language model built for long-horizon, multi-step tasks in real software environments such as web browsers and desktop applications. It combines language reasoning with visual perception, so it can follow complex instructions, keep track of intermediate state across steps, and recover from minor execution errors.
Tags: Multimodal, GGUF